Abstract

This paper proposes a simple procedure to decide whether the empirically-observed adjacency or
weights matrix, which characterizes the graph underlying a socio-economic network, is sufficiently
symmetric (respectively, asymmetric) to justify an undirected (respectively, directed) network
analysis. We introduce a new index that satisfies two main properties. First, it can be applied to
both binary and weighted graphs. Second, once suitably standardized, it is approximately distributed
as a standard normal over all possible adjacency/weights matrices. To test the index in practice, we
present an application that employs a set of well-known empirically-observed social and economic
networks.

I. INTRODUCTION

In recent years, the literature on networks has grown exponentially.
Empirical and theoretical contributions in very diverse fields such as physics, sociology
and economics have increasingly highlighted the pervasiveness of networked structures.
Examples range from the WWW, the Internet, airline connections, scientific collaborations and
citations, trade and labor market contacts, friendship and other social relationships, business
relations and R&D partnerships, all the way through cellular, ecological and neural networks.

Empirical research has thoroughly studied the (often complex) topological properties
of such networks, whereas a large number of theoretical models have been proposed in order
to investigate how networks evolve through time. Structural properties of networks
have been shown to heavily affect the dynamics of the socio-economic systems that
they embed. As a result, understanding them has become crucial also as far as policy
implications are concerned [7].

The simplest mathematical description of a network is in terms of a graph, that is a
list of nodes $\{1, 2, \dots, N\}$ and a set of arrows (links), possibly connecting any two nodes.
Alternatively, one can characterize a network through an $N \times N$ real-valued matrix
$W = \{w_{ij}\}$, where any out-of-diagonal entry $w_{ij}$ is non-zero if and only if an arrow from
node $i$ to node $j$ exists in the network. Entries on the main diagonal are typically assumed to be
all different from zero (if self-interactions are allowed) or all equal to zero (if they are not).
Networks are either binary (dichotomous) or weighted. In binary networks all
links carry the same intensity. This means that in binary networks a link is either present or
not, i.e. $w_{ij} \in \{0, 1\}$. In this case, $W$ is called an "adjacency" matrix. Weighted networks
instead allow one to associate a weight (i.e. a positive real number) to each link, typically
proportional to its interaction strength or the flux intensity it carries [10, 11, 12, 13]. Any
non-zero entry $w_{ij}$ thus measures the weight of the link originating from $i$ and ending up in
$j$, and the resulting matrix $W$ is called the "weights" matrix [18].



Both binary and weighted networks can be undirected or directed. Formally, a network is
undirected if all links are bilateral, i.e. $w_{ij} w_{ji} > 0$ for every pair $i \neq j$ joined by a link.
This means that in undirected
networks all pairs of connected nodes mutually affect each other. One can thus replace
arrows with non-directed edges (or arcs) connecting any two nodes and forget about the
implicit directions. This greatly simplifies the analysis, as the tools for studying undirected
networks are much better developed and understood. Directed networks are instead not
symmetric, as there exists at least one pair of connected nodes wherein one directed link is not
reciprocated, i.e. $\exists\, i \neq j : w_{ij} > 0$, but $w_{ji} = 0$. Studying the topological properties
of directed networks, especially in the weighted case, can become more difficult, as one has
to distinguish inward from outward links in computing synthetic indices such as node and
average nearest-neighbor degree and strength, clustering coefficient, etc. Therefore, it is
not surprising that the properties of such indices are much less explored in the literature.

From a theoretical perspective, it is easy to distinguish undirected from directed networks:
the network is undirected if and only if the matrix $W$ is symmetric. When it comes to the
empirics, however, researchers often face the following problem. If the empirical network
concerns an intrinsically mutual social or economic relationship (e.g. friendship, marriage,
business partnerships, etc.) then $W$, as estimated from the data collected, is straightforwardly
symmetric and only tools designed for undirected network analysis are to be employed. More
generally, however, one deals with notionally non-mutual relationships, possibly entailing
directed networks. In that case, data usually allow one to build a matrix $W$ that, especially in
the weighted case, is hardly ever found to be symmetric. Strictly speaking, one should treat all
such networks as directed. This often implies a more complicated and convoluted analysis
and, frequently, less clear-cut results. The alternative, typically employed by practitioners in
the field, is to compute the ratio of the number of directed (bilateral) links actually present
in the network to the maximum number of possible directed links (i.e. $N(N-1)$). If this
ratio is "reasonably" large, then one can symmetrize the network (i.e. make it undirected)
and apply the relevant undirected network toolbox.

However, as shown in Ref. [15], this procedure has several drawbacks. In particular, it is
heavily dependent on the density of the network under analysis (i.e., the ratio of the
total number of existing links to $N(N-1)$).

Moreover, and most importantly here, if the network is weighted, the ratio of bilateral links
does not take into account the effect of link weights. Indeed, a bilateral link exists between
$i$ and $j$ if and only if $w_{ij} w_{ji} > 0$, i.e. irrespective of the actual size of the two weights. Of
course, as far as symmetry of $W$ is concerned, the sub-case where $w_{ij} \gg 0$, $w_{ji} \simeq 0$ will be
very different from the sub-case where $w_{ij} \simeq w_{ji} > 0$.
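To make this limitation concrete, the following minimal Python sketch (not part of the original analysis) computes the density and the bilateral-link ratio for two weight matrices. The function name `density_and_bilateral_ratio` and both three-node matrices are hypothetical, chosen only for illustration.

```python
def density_and_bilateral_ratio(w):
    """Return (d, b) for a square weight matrix given as a list of lists.

    d: share of ordered pairs (i, j), i != j, with w[i][j] > 0.
    b: share of ordered pairs with w[i][j] * w[j][i] > 0 (bilateral links).
    """
    n = len(w)
    pairs = n * (n - 1)
    links = sum(1 for i in range(n) for j in range(n)
                if i != j and w[i][j] > 0)
    bilateral = sum(1 for i in range(n) for j in range(n)
                    if i != j and w[i][j] * w[j][i] > 0)
    return links / pairs, bilateral / pairs


# Two hypothetical weighted networks with the same link pattern.
balanced = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]        # symmetric weights
lopsided = [[0, 0.99, 0.99], [0.01, 0, 0.99], [0.01, 0.01, 0]]  # very unequal weights

print(density_and_bilateral_ratio(balanced))
print(density_and_bilateral_ratio(lopsided))
```

Both matrices yield a density and a bilateral ratio equal to one, although only the first is symmetric: the ratio cannot tell them apart.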

In this paper, we present a simple procedure that tries to overcome this problem. More
specifically, we develop a simple index that can help in deciding when the empirically-observed
$W$ is sufficiently symmetric to justify an undirected network analysis. Our index
has two main properties. First, it can be applied with minor modifications to both binary and
weighted networks. Second, the standardized version of the index is approximately distributed as a
standard normal (over all possible matrices $W$). Therefore, after having set a threshold $x$, one might
conclude that the network is to be treated as if it were undirected if the standardized index
computed on $W$ is lower than $x$.

Of course, the procedure that we propose in the paper is by no means a statistical test for
the null hypothesis that $W$ involves some kind of symmetry. Indeed, one has almost always
to rely on a single observation for $W$ (more on that in Section V). Nevertheless, we believe
that the index studied here could provide a simple way to ground the "directed vs.
undirected" decision on more solid bases.

The paper is organized as follows. In Section II we define the index and we derive its basic
properties. Section III discusses its statistical properties, while in Section IV we apply the
procedure to the empirical networks extensively studied in [3]. Finally, Section V concludes.



II. DEFINITION AND BASIC PROPERTIES 



Consider a directed, weighted graph $G = (N, A)$, where $N$ is the number of nodes and
$A = \{a_{ij}\}$ is the $N \times N$ (real-valued) matrix of link "weights". Without loss
of generality, we can assume $a_{ij} \in [0, 1]$, $\forall i \neq j$, and $a_{ii} = a \in \{0, 1\}$, $i = 1, \dots, N$ [19]. In
line with social network analysis, we interpret the generic out-of-diagonal entry $a_{ij}$, $i \neq j$,
as the weight associated to the directed link originating from node $i$ and ending up in node
$j$ (i.e., the strength of the directed link $i \to j$ in the graph). A directed edge from $i$ to $j$ is
present if and only if $a_{ij} > 0$.

The idea underlying the construction of the index is very simple. If the graph $G$ is
undirected, then $A = A^T$, where $A^T$ is the transpose of $A$. Denoting by $\|\cdot\|$ any norm
defined on square matrices, the extent to which directionality of links counts in the graph
$G$ can therefore be measured by some increasing function of $\|A - A^T\|$, suitably rescaled by
some increasing function of $\|A\|$ (and possibly of $\|A^T\|$).

To build the index we first define, again without loss of generality:

$$\tilde{A} = \{\tilde{a}_{ij}\} = A + (1 - a) I_N \tag{1}$$

where $I_N$ is the $N \times N$ identity matrix. Accordingly, we define the graph $\tilde{G} = (N, \tilde{A})$. Notice
that $\tilde{a}_{ij} = a_{ij}$ for all $i \neq j$, while now $\tilde{a}_{ii} = 1$ for all $i$.



Consider then the square of the Frobenius (or Hilbert-Schmidt) norm:

$$\|A\|_F^2 = \sum_i \sum_j a_{ij}^2 \tag{2}$$

where all sums (also in what follows) span from 1 to $N$. Notice that $\|A\|_F^2$ is invariant with
respect to the transpose operator, i.e. $\|A\|_F^2 = \|A^T\|_F^2$.
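We thus propose the following index:

$$\tilde{S}(A) = \frac{\|\tilde{A} - \tilde{A}^T\|_F^2}{\|\tilde{A}\|_F^2 + \|\tilde{A}^T\|_F^2} = \frac{\|\tilde{A} - \tilde{A}^T\|_F^2}{2\,\|\tilde{A}\|_F^2}. \tag{3}$$

By exploiting the symmetry of $(a_{ij} - a_{ji})^2$ one easily gets:

$$\tilde{S}(A) = \frac{\sum_i \sum_{j>i} (a_{ij} - a_{ji})^2}{N + \sum_i \sum_{j \neq i} a_{ij}^2}. \tag{4}$$

Alternatively, by expanding the squared term at the numerator, we obtain:

$$\tilde{S}(A) = \frac{\sum_i \sum_{j \neq i} a_{ij}^2 - 2 \sum_i \sum_{j>i} a_{ij} a_{ji}}{N + \sum_i \sum_{j \neq i} a_{ij}^2} \tag{5}$$

$$\phantom{\tilde{S}(A)} = 1 - \frac{N + 2 \sum_i \sum_{j>i} a_{ij} a_{ji}}{N + \sum_i \sum_{j \neq i} a_{ij}^2}. \tag{6}$$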
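As a sanity check on the algebra, the short Python sketch below evaluates the index both from the norm-based definition in eq. (3) and from the closed form in eq. (4). The function names and the three-node matrix are hypothetical and serve only as an illustration.

```python
def frobenius_sq(m):
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(x * x for row in m for x in row)


def s_tilde_from_norms(a):
    """Eq. (3), after forcing the main diagonal to one as in eq. (1)."""
    n = len(a)
    at = [[1.0 if i == j else float(a[i][j]) for j in range(n)]
          for i in range(n)]
    diff = [[at[i][j] - at[j][i] for j in range(n)] for i in range(n)]
    return frobenius_sq(diff) / (2.0 * frobenius_sq(at))


def s_tilde_closed_form(a):
    """Eq. (4): off-diagonal entries only, plus N for the unit diagonal."""
    n = len(a)
    num = sum((a[i][j] - a[j][i]) ** 2
              for i in range(n) for j in range(i + 1, n))
    den = n + sum(a[i][j] ** 2
                  for i in range(n) for j in range(n) if i != j)
    return num / den


# Hypothetical 3-node weighted graph (weights already scaled to [0, 1]).
a = [[0.0, 0.2, 0.9],
     [0.4, 0.0, 0.0],
     [0.1, 0.5, 0.0]]

print(s_tilde_from_norms(a), s_tilde_closed_form(a))
```

For this matrix the numerator of eq. (4) is 0.04 + 0.64 + 0.25 = 0.93 and the denominator is 3 + 1.27 = 4.27, so both functions return about 0.218.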

The index $\tilde{S}(A)$ has a few interesting properties, which we summarize in the following:

Lemma 1 (General properties of $\tilde{S}$) For all real-valued $N \times N$ matrices $A = \{a_{ij}\}$ s.t.
$a_{ij} \in [0, 1]$, $i \neq j$, and $a_{ii} = 1$, $i = 1, \dots, N$:

(1) $\tilde{S}(A) \geq 0$.

(2) $\tilde{S}(A) = 0 \iff A = A^T$, i.e. if and only if the graph is undirected.

(3) $\tilde{S}(A) \leq \frac{N-1}{N+1}$.

Proof. See Appendix A. ■

Furthermore, when $G$ is binary (i.e., $a_{ij} \in \{0, 1\}$ for all $i \neq j$), the index in eq. (3) turns
out to be closely connected to the density of the graph (i.e., the ratio of the total
number of directed links to the maximum possible number of directed links) and to the ratio
of the number of bilateral directed links in $G$ (i.e. links from $i$ to $j$ s.t. $a_{ij} = a_{ji} = 1$) to the
maximum possible number of directed links. More precisely:

Lemma 2 (Properties of $\tilde{S}$ in the case of binary graphs) When $G$ is binary, i.e.
$a_{ij} \in \{0, 1\}$ for all $i, j$, then:

$$\tilde{S}(A) = \frac{d(A) - b(A)}{(N-1)^{-1} + d(A)}, \tag{7}$$

where $d(A)$ is the density of $G$ and $b(A)$ is the ratio of the number of bilateral directed
links to the maximum number of directed links.

Proof. See Appendix B. ■

Notice that, in the case of undirected graphs, $b(A) = d(A)$ and $\tilde{S}(A) = 0$. On the
contrary, when there are no bilateral links, $b(A) = 0$. Hence, $\tilde{S}(A) = [(N-1)^{-1} d(A)^{-1} + 1]^{-1}$,
which is maximized when $d(A) = \frac{1}{2}$ (the largest density compatible with $b(A) = 0$), i.e.
$\tilde{S}(A) = \frac{N-1}{N+1}$, as shown in Lemma 1. Obviously, the
larger $b(A)$, the more the graph $G$ is undirected. As mentioned in Section I, $b(A)$ can be
employed to check for the extent to which directionality counts in $G$. However, such an index
is not very useful in weighted graphs, as it does not take into account the size effect (i.e.
the size of weights as measured by $a_{ij} \in [0, 1]$).
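The identity in Lemma 2 can be checked numerically. The sketch below is a minimal illustration, with hypothetical helper names and an arbitrary seeded random binary matrix, comparing the general closed form of eq. (4) with the density-based expression of eq. (7).

```python
import random


def s_tilde(a):
    """General closed form of the index, eq. (4)."""
    n = len(a)
    num = sum((a[i][j] - a[j][i]) ** 2
              for i in range(n) for j in range(i + 1, n))
    den = n + sum(a[i][j] ** 2
                  for i in range(n) for j in range(n) if i != j)
    return num / den


def density(a):
    """d(A): existing directed links over N(N-1)."""
    n = len(a)
    return sum(a[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


def bilateral_ratio(a):
    """b(A): reciprocated directed links over N(N-1)."""
    n = len(a)
    return sum(a[i][j] * a[j][i]
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


def s_tilde_binary(a):
    """Eq. (7), valid for binary matrices only."""
    n = len(a)
    d, b = density(a), bilateral_ratio(a)
    return (d - b) / (1.0 / (n - 1) + d)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
n = 8
a = [[0 if i == j else rng.randint(0, 1) for j in range(n)] for i in range(n)]

print(s_tilde(a), s_tilde_binary(a))
```

The two printed values coincide up to floating-point error for any binary matrix; for weighted matrices eq. (7) no longer applies, because $a_{ij}^2 \neq a_{ij}$ in general.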

In the case of binary graphs, the index in (3) almost coincides with the one proposed
in [15], whose authors suggest employing the correlation coefficient between the $(a_{ij}, a_{ji})$ entries of the
adjacency matrix (excluding self loops). It can be shown that this alternative index, unlike
the one in (3), increases with the number of reciprocated links only if both the total
number of links that are in place and the density of the network remain constant. See also
[16, 17] for applications.



Since $\tilde{S}(A) \in [0, \frac{N-1}{N+1}]$, in what follows we shall employ its rescaled version:

$$S(A) = \frac{N+1}{N-1}\, \tilde{S}(A), \tag{8}$$

which ranges in the unit interval and thus has a more straightforward interpretation.
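The two end points of the unit interval are easy to reproduce. In the hypothetical sketch below, `two_way_complete` builds a fully reciprocated binary graph and `one_way_complete` a graph where every pair is linked in one direction only (ones above the diagonal, as in the proof of Lemma 1); both helper names are illustrative assumptions.

```python
def rescaled_index(a):
    """S(A) of eq. (8), using the closed form of eq. (4)."""
    n = len(a)
    num = sum((a[i][j] - a[j][i]) ** 2
              for i in range(n) for j in range(i + 1, n))
    den = n + sum(a[i][j] ** 2
                  for i in range(n) for j in range(n) if i != j)
    return (n + 1) / (n - 1) * num / den


def one_way_complete(n):
    """Every pair is linked in one direction only (ones above the diagonal)."""
    return [[1 if j > i else 0 for j in range(n)] for i in range(n)]


def two_way_complete(n):
    """Every pair is linked in both directions (a symmetric matrix)."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


for n in (3, 10, 50):
    print(n, rescaled_index(two_way_complete(n)), rescaled_index(one_way_complete(n)))
```

For each $N$ the symmetric graph gives $S = 0$ and the one-way graph gives $S = 1$ (up to rounding), matching properties (2) and (3) of Lemma 1 after rescaling.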



III. STATISTICAL PROPERTIES 

In this section we study the distribution of the index $S$ as defined in eqs. (3) and (8). Indeed,
although the range of $S$ does not depend on $N$, we expect its distribution to be affected by:
(i) the size $N$ of the matrix $A$; (ii) whether the underlying graph $G$ is binary ($a_{ij} \in \{0, 1\}$)
or weighted ($a_{ij} \in [0, 1]$).

To do so, for each $N \in \{5, 10, 50, 100, 200, 500, 700, 1000\}$ we generate $M$ = 100,000
random matrices $A$ obeying the restriction that $a_{ii} = 1$ for all $i$. In the binary case,
out-of-diagonal entries $\{a_{ij}, i \neq j\}$ are drawn from i.i.d. Bernoulli random variables with
$prob\{a_{ij} = 0\} = prob\{a_{ij} = 1\} = 0.5$. In the weighted case, entries $a_{ij}$ are i.i.d. random variables
uniformly distributed over $[0, 1]$. We then estimate the distributions of $S$ in both the binary
and the weighted cases, and we study their behavior as the size of the graph increases. Let
us denote by $m_B(N)$ (respectively, $m_W(N)$) the sample mean of the index $S$ in the binary
(respectively, weighted) case, and by $s_B(N)$ (respectively, $s_W(N)$) the sample standard
deviation of the index $S$ in the binary (respectively, weighted) case. Simulation results are
summarized in the following points.

1. In both the binary and the weighted case, the index $S$ is approximately distributed as
a Beta random variable for all $N$. As $N$ increases, $m_B(N)$ decreases towards 0.50
whereas $m_W(N)$ increases towards 0.25. Both standard deviations decrease towards
0. More precisely, the following approximate relations hold (see Figures 1 and 2):

$$m_B(N) \simeq 0.50 + \exp\{-1.786369 - 1.680938 \ln N\} \tag{9}$$

$$m_W(N) \simeq 0.25 - \exp\{-1.767551 - 0.937586 \ln N\} \tag{10}$$

$$s_B(N) \simeq \exp\{-0.135458 - 1.001695 \ln N\} \tag{11}$$

$$s_W(N) \simeq \exp\{-0.913297 - 0.982570 \ln N\} \tag{12}$$




FIG. 1: Binary Graphs. Sample mean and standard deviation of $S$ vs. $N$, together with OLS fits.
Log-scale on both axes. OLS fits: $\ln[m_B(N) - 0.50] \simeq -1.786369 - 1.680938 \ln N$ ($R^2 = 0.998934$)
and $\ln[s_B(N)] \simeq -0.135458 - 1.001695 \ln N$ ($R^2 = 0.999995$).

FIG. 2: Weighted Graphs. Sample mean and standard deviation of $S$ vs. $N$, together with OLS fits.
Log-scale on both axes. OLS fits: $\ln[0.25 - m_W(N)] \simeq -1.767551 - 0.937586 \ln N$ ($R^2 = 0.998966$)
and $\ln[s_W(N)] \simeq -0.913297 - 0.982570 \ln N$ ($R^2 = 0.999932$).



2. Given the approximate relations in eqs. (9)-(12), let us standardize the index $S$ as follows:

$$S_B(A) = \frac{S(A) - m_B(N)}{s_B(N)}, \tag{13}$$

$$S_W(A) = \frac{S(A) - m_W(N)}{s_W(N)}. \tag{14}$$

Simulations indicate that the standardized versions of the index, i.e. $S_B$ and $S_W$, are
both well approximated by a N(0, 1), even for small $N$ ($N > 10$). Indeed, as Figures
3 and 4 show, the mean of the distributions of $S_B$ and $S_W$ converges towards
zero as $N$ grows, while the standard deviation approaches one (we actually plot the standard deviation
minus one to have a plot in the same scale). The third (skewness) and the fourth
moment (excess kurtosis) also stay close to zero. We also plot the estimated distributions
of $S_B$ and $S_W$ vs. $N$, see Figures 5 and 6. It can be seen that all estimated densities
collapse towards a N(0, 1). Notice that the y-axis is in log scale: this allows one to
appreciate how close to a N(0, 1) the distributions are in the tails, for all $N$.

FIG. 3: Binary Graphs. Moments of $S_B$ vs. $N$.

FIG. 4: Weighted Graphs. Moments of $S_W$ vs. $N$.

FIG. 5: Binary Graphs. Estimated distribution of $S_B$ vs. $N$. The N(0, 1) fit is also
shown as a solid line. Axis label: Rescaled S.

FIG. 6: Weighted Graphs. Estimated distribution of $S_W$ vs. $N$. The N(0, 1) fit is also
shown as a solid line. Axis label: Rescaled S.



Notice finally that as $N$ increases, the distribution maintains a constant second moment
but the range increases linearly with $N$, see Figures 7 and 8. The lower bound (LB) and
the upper bound (UB) indeed read approximately:

$$LB_*(N) = -\frac{m_*(N)}{s_*(N)}, \qquad UB_*(N) = \frac{1 - m_*(N)}{s_*(N)}, \tag{15}$$

FIG. 7: Binary Graphs. Lower and upper bounds of the re-scaled index $S_B$ vs. $N$, together
with the OLS fit. OLS fits: Upper_Bound = 0.579307*N - 0.195911 ($R^2 = 1.000000$);
Lower_Bound = -0.579232*N + 0.13728 ($R^2 = 1.000000$).

FIG. 8: Weighted Graphs. Lower and upper bounds of the re-scaled index $S_W$ vs. $N$,
together with the OLS fit. OLS fits: Upper_Bound = 1.657477*N + 5.50168 ($R^2 = 0.999958$);
Lower_Bound = -0.552350*N - 1.176737 ($R^2 = 0.999960$).



where $\{*\} = \{B, W\}$ stands for binary (B) and weighted (W). Since the standardized
index is well approximated by a N(0, 1) for all $N$, this means that extreme values
become more and more unlikely. This is intuitive, because as $N$ grows, matrices
attaining the highest/lowest values of the index become very rare.
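The simulation design of this section can be sketched in a few lines of Python. The version below is a scaled-down illustration, not the original code: it uses a single size ($N = 20$), only $M = 500$ replications instead of 100,000, an arbitrary seed, and hypothetical function names.

```python
import random
import statistics


def rescaled_index(a):
    """S(A) of eq. (8), using the closed form of eq. (4)."""
    n = len(a)
    num = sum((a[i][j] - a[j][i]) ** 2
              for i in range(n) for j in range(i + 1, n))
    den = n + sum(a[i][j] ** 2
                  for i in range(n) for j in range(n) if i != j)
    return (n + 1) / (n - 1) * num / den


def random_matrix(n, weighted, rng):
    """Unit diagonal; off-diagonal entries are U[0, 1] or Bernoulli(0.5)."""
    draw = rng.random if weighted else (lambda: float(rng.randint(0, 1)))
    return [[1.0 if i == j else draw() for j in range(n)] for i in range(n)]


def simulate(n, m, weighted, seed=0):
    """Sample mean and standard deviation of S over m random matrices."""
    rng = random.Random(seed)
    values = [rescaled_index(random_matrix(n, weighted, rng)) for _ in range(m)]
    return statistics.mean(values), statistics.stdev(values)


mean_b, sd_b = simulate(20, 500, weighted=False)
mean_w, sd_w = simulate(20, 500, weighted=True)
print(f"binary:   mean={mean_b:.4f}  sd={sd_b:.4f}")
print(f"weighted: mean={mean_w:.4f}  sd={sd_w:.4f}")
```

With these settings the sample means should land close to the values implied by eqs. (9) and (10) at $N = 20$ (roughly 0.50 and 0.24), up to Monte Carlo noise.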



IV. EXAMPLES 

The index developed above can be easily employed to assess the extent to which link
directionality matters in real-world networks. Suppose we have estimated an $N \times N$ matrix
$X = \{x_{ij}\}$ describing a binary (B) or a weighted (W) graph. We then compute the index:



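$$S_*(X) = \frac{S(X) - m_*(N)}{s_*(N)} \tag{16}$$

$$\phantom{S_*(X)} = \frac{1}{s_*(N)} \left[ \frac{N+1}{N-1} \cdot \frac{\sum_i \sum_{j>i} (x_{ij} - x_{ji})^2}{N + \sum_i \sum_{j \neq i} x_{ij}^2} - m_*(N) \right], \tag{17}$$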


where $\{*\} = \{B, W\}$ and $(m_*(N), s_*(N))$ are as in eqs. (9)-(12). Since we know that $S_*(X)$ is
approximately N(0, 1), we can fix a threshold in order to decide whether the network
is sufficiently (un)directed. For instance, we could set the threshold equal to 0 (i.e.
equal to the mean), and decide that if $S_*(X) > 0$ (above the mean) we shall treat the
network as directed (and undirected otherwise). More generally, one might set a threshold
equal to $x \in \mathbb{R}$ and conclude that the graph is undirected if $S_*(X) < x$. Otherwise, one
should expect the directional nature of the graph to be sufficiently strong, so that a digraph
analysis is called for.
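The decision rule can be written down compactly. The Python sketch below is an illustration under stated assumptions: it takes the fitted coefficients of eqs. (9)-(12) as given, assumes the input matrix is already scaled to $[0, 1]$, and applies eqs. (16)-(17) to two hypothetical ring graphs; all function names are invented for the example.

```python
import math


def rescaled_index(x):
    """S(X) of eq. (8), using the closed form of eq. (4)."""
    n = len(x)
    num = sum((x[i][j] - x[j][i]) ** 2
              for i in range(n) for j in range(i + 1, n))
    den = n + sum(x[i][j] ** 2
                  for i in range(n) for j in range(n) if i != j)
    return (n + 1) / (n - 1) * num / den


def fitted_moments(n, kind):
    """(m_*(N), s_*(N)) from the OLS fits of eqs. (9)-(12); kind is 'B' or 'W'."""
    log_n = math.log(n)
    if kind == "B":
        return (0.50 + math.exp(-1.786369 - 1.680938 * log_n),
                math.exp(-0.135458 - 1.001695 * log_n))
    if kind == "W":
        return (0.25 - math.exp(-1.767551 - 0.937586 * log_n),
                math.exp(-0.913297 - 0.982570 * log_n))
    raise ValueError("kind must be 'B' or 'W'")


def standardized_index(x, kind):
    """S_*(X) as in eqs. (16)-(17)."""
    mean, sd = fitted_moments(len(x), kind)
    return (rescaled_index(x) - mean) / sd


def treat_as_undirected(x, kind, threshold=0.0):
    """Decision rule: undirected if the standardized index is below the threshold."""
    return standardized_index(x, kind) < threshold


def ring(n, both_ways):
    """Hypothetical test graph: node i points to node i+1 (and back if both_ways)."""
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        x[i][j] = 1
        if both_ways:
            x[j][i] = 1
    return x


print(standardized_index(ring(10, both_ways=True), "B"))
print(standardized_index(ring(10, both_ways=False), "B"))
```

With the zero threshold, the reciprocated ring falls well below the mean and is treated as undirected, while the one-way ring lies above it and is treated as directed.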



To test the index against real-world cases, we have taken the thirteen social and economic
networks analyzed in [3], see Table I [20]. All networks are binary and directed, apart from
Freeman's ones (which are weighted and directed) and Padgett's ones (which are binary
and undirected). Table I reports both the index $S$ and its standardized versions $S_*$, $\{*\} =
\{B, W\}$, for all cited examples.

TABLE I: The index $S$ and its standardized version $S_*$, $\{*\}$ = {B(inary), W(eighted)}, for the social
networks studied in [3], cf. Chapter 2.5.





 #  Social Network                                               N  S                S_*
 1  Advice relations btw Krackhardt's hi-tech managers          21  0.521327    0.491228
 2  Friendship relations btw Krackhardt's hi-tech managers      21  0.500813    0.004610
 3  "Reports-to" relations btw Krackhardt's hi-tech managers    21  0.536585    0.860033
 4  Business relationships btw Padgett's Florentine families    16  0.000000   -9.232823
 5  Marital relationships btw Padgett's Florentine families     16  0.000000   -9.232823
 6  Acquaintanceship among Freeman's EIES researchers (Time 1)  32  0.109849  -10.025880
 7  Acquaintanceship among Freeman's EIES researchers (Time 2)  32  0.094968  -11.143250
 8  Messages sent among Freeman's EIES researchers              32  0.014548  -17.181580
 9  Country Trade Flows: Basic Manufactured Goods               24  0.260349   -6.643695
10  Country Trade Flows: Food and Live Animals                  24  0.311966   -5.217508
11  Country Trade Flows: Crude Materials (excl. Food)           24  0.272560   -6.306300
12  Country Trade Flows: Minerals, Fuels, Petroleum             24  0.403336   -2.692973
13  Country Trade Flows: Exchange of Diplomats                  24  0.080208  -11.620970




Suppose we fix the threshold at zero. Padgett's networks, being undirected,
display a very low value (in fact, the non-standardized index is equal to zero, as expected).
The table also suggests treating all the binary trade networks as undirected. The same advice
applies to Freeman's networks, which are instead weighted. The only networks which have
an almost clear directed nature (according to our threshold) are Krackhardt's ones. In that
case our index indicates that a directed graph analysis would be more appropriate.
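The standardized values in Table I can be recomputed from the reported $S$ and $N$ alone. The sketch below is a minimal Python check, assuming the fitted coefficients of eqs. (9)-(12); the `FIT` layout and the function names are illustrative, and only four of the thirteen rows are copied from the table.

```python
import math

# Coefficients of eqs. (9)-(12): limit value, sign of the exponential term,
# and (intercept, slope) pairs of the log-log OLS fits.
FIT = {
    "B": {"limit": 0.50, "sign": +1.0,
          "mean": (-1.786369, -1.680938), "sd": (-0.135458, -1.001695)},
    "W": {"limit": 0.25, "sign": -1.0,
          "mean": (-1.767551, -0.937586), "sd": (-0.913297, -0.982570)},
}


def standardize(s, n, kind):
    """Eq. (16): (S - m_*(N)) / s_*(N) for kind 'B' (binary) or 'W' (weighted)."""
    c = FIT[kind]
    log_n = math.log(n)
    mean = c["limit"] + c["sign"] * math.exp(c["mean"][0] + c["mean"][1] * log_n)
    sd = math.exp(c["sd"][0] + c["sd"][1] * log_n)
    return (s - mean) / sd


# (label, kind, N, S, reported S_*) copied from Table I.
rows = [
    ("Krackhardt advice", "B", 21, 0.521327, 0.491228),
    ("Padgett business", "B", 16, 0.000000, -9.232823),
    ("Freeman acquaintanceship (Time 1)", "W", 32, 0.109849, -10.025880),
    ("Trade: minerals, fuels, petroleum", "B", 24, 0.403336, -2.692973),
]

for label, kind, n, s, reported in rows:
    z = standardize(s, n, kind)
    verdict = "directed" if z > 0 else "undirected"
    print(f"{label}: S_* = {z:.4f} (reported {reported:.4f}) -> {verdict}")
```

The recomputed values agree with the table to about three decimal places, and with a zero threshold only the Krackhardt row is flagged as directed.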

V. CONCLUDING REMARKS 

In this paper we have proposed a new procedure that might help to decide whether an
empirically-observed adjacency or weights $N \times N$ matrix $W$, describing the graph underlying
a social or economic network, is sufficiently symmetric to justify an undirected network
analysis. The index that we have developed has two main properties. First, it can be applied
to both binary and weighted graphs. Second, once suitably standardized, it is approximately
distributed as a standard normal over all possible adjacency/weights matrices. Therefore, given a threshold
decided by the researcher, any empirically observed adjacency/weights matrix displaying a
value of the index lower (respectively, higher) than the threshold is to be treated as if it
characterizes an undirected (respectively, directed) network.

It must be noticed that setting the threshold always relies on a personal choice, as also
happens in statistical hypothesis tests with the choice of the significance level $\alpha$. Despite
this unavoidable degree of freedom, the procedure proposed above still allows for sufficient
comparability among results coming from different studies (i.e. where researchers set different
thresholds) if both the value of the index $S$ and the size of the network are documented
in the analysis. In that case, one can easily compute the probability of finding a matrix
with a lower/higher degree of symmetry, simply by using the definition of bounds (see eq.
(15)) and probability tables for the standard normal.
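In place of printed probability tables, the standard normal distribution function can be evaluated with the error function. The sketch below is a simplified illustration: it relies on the normal approximation only, ignores the finite bounds of eq. (15), and uses a hypothetical function name together with one standardized value taken from Table I.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Standardized index reported in Table I for network 12
# (Country Trade Flows: Minerals, Fuels, Petroleum).
z = -2.692973

p_lower = standard_normal_cdf(z)   # approx. share of random matrices with a lower index
p_higher = 1.0 - p_lower           # approx. share with a higher index
print(f"P(S_* < z) = {p_lower:.5f}, P(S_* > z) = {p_higher:.5f}")
```

Under this approximation well below one percent of random matrices of the same size would display a lower value of the index than the one observed for that network.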

A final remark is in order. As mentioned, our procedure is not
a statistical test. Since the researcher often relies on a single observation of the network
under study (or a sequence of serially-correlated network snapshots through time), statistical
hypothesis testing will be only very rarely feasible. Nevertheless, in the case where a sample
of $M$ i.i.d. observations of $W$ is available, one might consider using the sample average
of the index (multiplied by $\sqrt{M}$) and employing the central limit theorem to test the hypothesis
that the observations come from an undirected (random) graph.
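A rough sketch of such a test follows. It assumes, beyond what is stated above, that each observation has already been standardized as in eq. (16) and that, under the benchmark of Section III, the standardized values are independent with zero mean and unit variance; the function names and the sample values are hypothetical.

```python
import math


def pooled_statistic(standardized_values):
    """sqrt(M) times the sample mean of M standardized index values."""
    m = len(standardized_values)
    return math.sqrt(m) * sum(standardized_values) / m


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical standardized values from M = 4 independent observations.
sample = [1.0, 1.0, 1.0, 1.0]
z = pooled_statistic(sample)
print(z, two_sided_p_value(z))
```

Whether independence holds for a given data set is an empirical question; serially-correlated snapshots, as noted above, would violate it.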
APPENDIX A: PROOF OF LEMMA 1

Points (1) and (2) simply follow from the definition in eq. (3). As to (3), let us suppose that
there exists a matrix $A$ satisfying the above restrictions and such that $\tilde{S}(A) > \frac{N-1}{N+1}$. Then,
using eq. (6):

$$\frac{N + 2 \sum_i \sum_{j>i} a_{ij} a_{ji}}{N + \sum_i \sum_{j \neq i} a_{ij}^2} < \frac{2}{N+1}. \tag{A1}$$

The best case for such an inequality to be satisfied is when the left hand side is minimized.
This is achieved when there are $N(N-1)/2$ entries equal to one and $N(N-1)/2$ entries
equal to zero in such a way that $a_{ij} \neq a_{ji}$ for all $i \neq j$ (e.g., when the upper triangular part of the matrix
is made of all ones and the lower triangular part is made of all zeroes). In that case the
left hand side is exactly equal to $\frac{N}{N + N(N-1)/2} = \frac{2}{N+1}$, leading to the absurd conclusion that
$\frac{2}{N+1} < \frac{2}{N+1}$.



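APPENDIX B: PROOF OF LEMMA 2

It follows from the definition of $d(A)$ that, since $a_{ij}^2 = a_{ij}$ in the binary case:

$$d(A) = \frac{\sum_i \sum_{j \neq i} a_{ij}}{N(N-1)} = \frac{\sum_i \sum_{j \neq i} a_{ij}^2}{N(N-1)}. \tag{B1}$$

Moreover, it is easy to see that:

$$b(A) = \frac{\sum_i \sum_{j \neq i} a_{ij} a_{ji}}{N(N-1)} = \frac{2 \sum_i \sum_{j>i} a_{ij} a_{ji}}{N(N-1)}. \tag{B2}$$

To prove the Lemma, it suffices to note that:

$$\tilde{S}(A) = 1 - \frac{N + 2 \sum_i \sum_{j>i} a_{ij} a_{ji}}{N + \sum_i \sum_{j \neq i} a_{ij}^2} \tag{B3}$$

$$\phantom{\tilde{S}(A)} = 1 - \frac{1 + (N-1)\, b(A)}{1 + (N-1)\, d(A)} = \frac{d(A) - b(A)}{(N-1)^{-1} + d(A)}. \tag{B4}$$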